Some of the read_csv() uses the local path method taught in Lecture 4, so you will have to hard wire the full path if you don’t follow best practices.
Let us revisit left_join and give you plenty of practice
I previously shared the image below in slide 33 of Lecture 5 last week to visually show you how left_join() works. We use left_join(data_frame1, dataframe2) to merge data from another data frame (table) into another data frame (table). We typically want to save the joined data into a data frame in our R environment.
So what you’ll typically see is: new_data_frame <- left_join(data_frame1, dataframe2)
There are many types of joins. But you may think of new_data_frame <- left_join(data_frame1, dataframe2) as supplementing data from data_frame2 into the underlying data of data_frame1 then saves the joined data into the R environment as new_data_frame. To learn more about joins (e.g., other joins such as inner_join), you may see a detailed lecture on joins I gave lecture last semester for DATA 412/612 here: https://jmtfeliciano.github.io/DATA412Fall2024/Data412612Lecture7
Important note:new_data_frame <- left_join(data_frame1, dataframe2) only works if and only if they share a common variable. Moreover, the common variable must also be of the same type (e.g., both are chr or both are numeric/double/integer).
Preliminaries Part 1
Run the script below to load the tidyverse package.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Practice 1: left_join() - 3 minutes
Important Note Above Code Blow: You don’t have to know tribble(). Just know I am creating data frames (one named comics_character, the other comics_publisher) into the R environment below
# A tibble: 7 × 3
name alignment gender
<chr> <chr> <chr>
1 Magneto bad male
2 Storm good female
3 Mystique bad female
4 Batman good male
5 Joker bad male
6 Catwoman bad female
7 Hellboy good male
# A tibble: 7 × 2
name publisher
<chr> <chr>
1 Magneto Marvel
2 Storm Marvel
3 Mystique Marvel
4 Batman DC
5 Joker DC
6 Catwoman DC
7 Hellboy Dark Horse Comics
Using the R chunk below, create comics_character_v2 after pulling the publisher data from comics_publisher into the underlying data for comics_character. Afterward, use head() to visually confirm that comics_character_v2 is created as expected Important note: Before typing your code below, visually confirm from the printed data frame data above that both data frames have (a) a shared variable with the same name and (b) that variable (also called the join key) has the same variable type in both data frames.
# Place your code belowcomics_character_v2 <-left_join(comics_character, comics_publisher)
Joining with `by = join_by(name)`
head(comics_character_v2, 10)
# A tibble: 7 × 4
name alignment gender publisher
<chr> <chr> <chr> <chr>
1 Magneto bad male Marvel
2 Storm good female Marvel
3 Mystique bad female Marvel
4 Batman good male DC
5 Joker bad male DC
6 Catwoman bad female DC
7 Hellboy good male Dark Horse Comics
Practice 2: left_join() - 3 minutes
Use the space below to load employees.csv and salaries.csv (both are under Week 6 on Canvas) using read_csv() into your R environment. Name these data frames however you want as long it follows best practice. Make sure to use head() for both data frames so you can study the underlying data.
# Place your code belowemployees <-read_csv("data/employees.csv")
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Name
dbl (2): EmployeeID, Age
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(employees)
# A tibble: 6 × 3
EmployeeID Name Age
<dbl> <chr> <dbl>
1 101 Alice 25
2 102 Bob 30
3 103 Charlie 35
4 104 David 40
5 105 Emma 28
6 106 Frank 33
salaries <-read_csv("data/salaries.csv")
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): City
dbl (2): EmployeeID, Salary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(salaries)
# A tibble: 6 × 3
EmployeeID City Salary
<dbl> <chr> <dbl>
1 101 New York 60000
2 102 Los Angeles 65000
3 104 Chicago 70000
4 106 Houston 62000
5 107 Phoenix 67000
6 108 Philadelphia 68000
Next, using left_join() to combine salary data into the employees data. Save the resulting data frame into your R environment and you may name it however you want as long the name follows best practice. Important note (again): Before typing your code below, visually confirm from the printed data frame data above that both data frames have (a) a shared variable with the same name and (b) that shared variable (also called the join key) has the same variable type in both data frames.
# Now that we have confirmed join key is EmployeeID and they're the same type (dbl)combined_data <-left_join(employees, salaries)
Joining with `by = join_by(EmployeeID)`
head(combined_data)
# A tibble: 6 × 5
EmployeeID Name Age City Salary
<dbl> <chr> <dbl> <chr> <dbl>
1 101 Alice 25 New York 60000
2 102 Bob 30 Los Angeles 65000
3 103 Charlie 35 <NA> NA
4 104 David 40 Chicago 70000
5 105 Emma 28 <NA> NA
6 106 Frank 33 Houston 62000
Join key have different names
Suppose we want to left_join() two data frames called data_frame1 and data_frame2. But the join key, let’s say is social security number, differs in spelling. How do you handle this? Before answering this, let us look at the data again for employee_ssn and salaries_ssn below:
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): social_security_number, Name
dbl (1): Age
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): ssn, City
dbl (1): Salary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(employee_ssn)
# A tibble: 6 × 3
social_security_number Name Age
<chr> <chr> <dbl>
1 123-45-6789 Alice 25
2 234-56-7890 Bob 30
3 345-67-8901 Charlie 35
4 456-78-9012 David 40
5 567-89-0123 Emma 28
6 678-90-1234 Frank 33
head(salaries_ssn)
# A tibble: 6 × 3
ssn City Salary
<chr> <chr> <dbl>
1 123-45-6789 New York 60000
2 234-56-7890 Los Angeles 65000
3 456-78-9012 Chicago 70000
4 678-90-1234 Houston 62000
5 789-01-2345 Phoenix 67000
6 890-12-3456 Philadelphia 68000
What error do you get when you have no common variable? Run the script below to see the error mesage:
left_join(employee_ssn, salaries_ssn)
Error in `left_join()`:
! `by` must be supplied when `x` and `y` have no common variables.
ℹ Use `cross_join()` to perform a cross-join.
An easy remedy: pipe one of the read_csv() into rename() so both data frames have the same name for the appropriate key join.
Practice 3 (3 mins)
Modify the code below by incorporating or piping one of the read_csv() statements into rename() to enable the left_join() below:
# BEFORE WE MADE CHANGESemployee_ssn <-read_csv("https://raw.githubusercontent.com/jmtfeliciano/teachingdata/refs/heads/main/employees_ssn.csv") |>rename(ssn = social_security_number)salaries_ssn <-read_csv("https://raw.githubusercontent.com/jmtfeliciano/teachingdata/refs/heads/main/salaries_ssn.csv")head(employee_ssn)head(salaries_ssn)employee_salary_data <-left_join(employee_ssn, salaries_ssn)head(employee_salary_data)
# AFTER WE MADE THE CHANGEemployee_ssn <-read_csv("https://raw.githubusercontent.com/jmtfeliciano/teachingdata/refs/heads/main/employees_ssn.csv") |>rename(ssn = social_security_number)
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): social_security_number, Name
dbl (1): Age
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 10 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): ssn, City
dbl (1): Salary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(employee_ssn)
# A tibble: 6 × 3
ssn Name Age
<chr> <chr> <dbl>
1 123-45-6789 Alice 25
2 234-56-7890 Bob 30
3 345-67-8901 Charlie 35
4 456-78-9012 David 40
5 567-89-0123 Emma 28
6 678-90-1234 Frank 33
head(salaries_ssn)
# A tibble: 6 × 3
ssn City Salary
<chr> <chr> <dbl>
1 123-45-6789 New York 60000
2 234-56-7890 Los Angeles 65000
3 456-78-9012 Chicago 70000
4 678-90-1234 Houston 62000
5 789-01-2345 Phoenix 67000
6 890-12-3456 Philadelphia 68000
# A tibble: 6 × 5
ssn Name Age City Salary
<chr> <chr> <dbl> <chr> <dbl>
1 123-45-6789 Alice 25 New York 60000
2 234-56-7890 Bob 30 Los Angeles 65000
3 345-67-8901 Charlie 35 <NA> NA
4 456-78-9012 David 40 Chicago 70000
5 567-89-0123 Emma 28 <NA> NA
6 678-90-1234 Frank 33 Houston 62000
Join key have different types
Before proceeding, run the following script which will will import two new data frames named employee_ssn_v2 and salaries_ssn_v2 into your R environment. Pay close attention to the social security number column and their data types for each data frame.
Note: You don’t really have to know what load() does, just that I may ask you to run a similar code for a future class, homework, or exam which allows you to import multiple data frames at the same time.
# A tibble: 6 × 3
social_security_number Name Age
<dbl> <chr> <dbl>
1 123456789 Alice 25
2 234567890 Bob 30
3 345678901 Charlie 35
4 456789012 David 40
5 567890123 Emma 28
6 678901234 Frank 33
head(salaries_ssn_v2)
# A tibble: 6 × 3
social_security_number City Salary
<chr> <chr> <dbl>
1 123456789 New York 60000
2 234567890 Los Angeles 65000
3 456789012 Chicago 70000
4 678901234 Houston 62000
5 789012345 Phoenix 67000
6 890123456 Philadelphia 68000
What error do you get when you have no common variable? Run the script below to see the error message:
left_join(employee_ssn_v2, salaries_ssn_v2)
Joining with `by = join_by(social_security_number)`
Error in `left_join()`:
! Can't join `x$social_security_number` with `y$social_security_number`
due to incompatible types.
ℹ `x$social_security_number` is a <double>.
ℹ `y$social_security_number` is a <character>.
Again the issue: social_security_number in employee_ssn_v2 is double (which is numeric) while social_security_number in salaries_ssn_v2 is character or string.
Two ways to fix this:
(a) Create another data frame that we will use for the merging after forcing social_security_number from employee_ssn_v2 into character type by using:
|>
mutate(social_security_number = as.character(social_security_number)) or
(b) Create another data frame that we will use for the merging after forcing social_security_number from salaries_ssn_v2 into numeric type by using:
Let us implement one of the fixes together as a class. Remember, employee_ssn_v2 and salaries_ssn_v2 exist already. And goal is to make left_join() eventually work.
# In class practice togethersalaries_corrected <- salaries_ssn_v2 |>mutate(social_security_number =as.numeric(social_security_number))joined_data <-left_join(employee_ssn_v2, salaries_corrected)
Joining with `by = join_by(social_security_number)`
head(joined_data)
# A tibble: 6 × 5
social_security_number Name Age City Salary
<dbl> <chr> <dbl> <chr> <dbl>
1 123456789 Alice 25 New York 60000
2 234567890 Bob 30 Los Angeles 65000
3 345678901 Charlie 35 <NA> NA
4 456789012 David 40 Chicago 70000
5 567890123 Emma 28 <NA> NA
6 678901234 Frank 33 Houston 62000
Map making recipe in Lecture 5
We need a shapefile file which is a digital format for storing geographic location and associated attribute information (e.g., points, lines, or polygons). Think of boundaries, shapes, geometric information to delineate locations and boundaries. We will use the tigris package to get the shapefiles directly from the US Census Bureau, so we don’t have to work and manually load actual shapefiles (.shp) into R. Shapefiles loaded into R from tigris are both sf (simple feature which is geospatial data in R) and data frames.
We need data to map we are interested in. For example, use read_csv() to read data you want to plot.
Merge data (e.g., SVI data) to the shapefile (which is also a data frame) via left_join().
Additional Note about the shapefile: The join key for the shapefile from the tigris package will always be GEOID of character (or string) type. If there is a mismatch on the join key (i.e., different spelling for join key, join key having different type), pipe the shapefile or read_csv() into (a) rename() or (b) mutate() with either as.character() or as.numeric(). If you are loading csv files into R, GEOID or FIPS will almost always be read as a numeric type into R, so either (i) convert GEOID from the shapefile into numeric type or (ii) convert GEOID from the file you read into character type.
Leverage ggplot2 package to render the map by using geom_sf().
Preliminaries Part 2
Run the script below to load the tigris package:
library(tigris)
To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.
Map 1: census tract data for a particular state
Our goal for our first map is to create a map for the 2019 social vulnerability for MO.
Step 1 of Map 1: use tracts() to get tract-level shapefile for the state of MO for 2019.
dc_tract_shp <-tracts(year =2019, state ="MO") # important note: shapefile R objects don't show you variable/column data type via head() so you must use glimpse() for thisglimpse(dc_tract_shp)
Rows: 1388 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): STATE, ST_ABBR, COUNTY, LOCATION
dbl (8): ST, STCNTY, FIPS, RPL_THEME1, RPL_THEME2, RPL_THEME3, RPL_THEME4, R...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(mo_svi_data)
# A tibble: 6 × 12
ST STATE ST_ABBR STCNTY COUNTY FIPS LOCATION RPL_THEME1 RPL_THEME2
<dbl> <chr> <chr> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 29 MISSOURI MO 29189 St. Louis 2.92e10 Census … 0.0123 0.067
2 29 MISSOURI MO 29095 Jackson 2.91e10 Census … 0.0007 0.130
3 29 MISSOURI MO 29095 Jackson 2.91e10 Census … 0.018 0.0252
4 29 MISSOURI MO 29095 Jackson 2.91e10 Census … 0 0.189
5 29 MISSOURI MO 29189 St. Louis 2.92e10 Census … 0.0079 0.219
6 29 MISSOURI MO 29095 Jackson 2.91e10 Census … 0.0346 0.0159
# ℹ 3 more variables: RPL_THEME3 <dbl>, RPL_THEME4 <dbl>, RPL_THEMES <dbl>
Step 3 of Map 1: use left_join() to add data you loaded in R into the shapefile. Modify dc_tract_shp and mo_svi_data to enable join. Note: In this example, we piped mo_svi_data into rename(GEOID = FIPS) because the join key in the shapefile is GEOID, so we renamed FIPS into GEOID within the mo_svi_data data for the to enable the join.
# converts GEOID in the shapefile into numeric type instead of characterdc_tract_shp_corrected <- dc_tract_shp |>mutate(GEOID =as.numeric(GEOID))# renames FIPS into GEOID to enable joinmo_svi_data_corrected <- mo_svi_data |>rename(GEOID = FIPS) # perform joindc_tract_shp_with_data <-left_join(dc_tract_shp_corrected, mo_svi_data_corrected)
Step 4 of Map 1: use ggplot() to plot map. Again, we want to plot the values of RPL_THEMES. Best to add theme_bw(), theme_minimal(), or theme_void() customization.
A condense version of the code above where corrections are done via piping (preferred):
library(tigris)# Save shapefile and makes GEOID numeric instead of character typedc_tract_shp <-tracts(year =2019, state ="MO") |>mutate(GEOID =as.numeric(GEOID))# Loads SVI data and rename FIPS into GEOID to enable joinmo_svi_data <-read_csv("https://raw.githubusercontent.com/jmtfeliciano/teachingdata/refs/heads/main/MissouriSVI2019.csv") |>rename(GEOID = FIPS)
Rows: 1388 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): STATE, ST_ABBR, COUNTY, LOCATION
dbl (8): ST, STCNTY, FIPS, RPL_THEME1, RPL_THEME2, RPL_THEME3, RPL_THEME4, R...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Perform actual joindc_tract_shp_with_data <-left_join(dc_tract_shp, mo_svi_data)
For this class, your goal is to read dc_household_income_2023.csv and plot median_income which is the household median for income in DC for 2023. Use the R chunk below to create your map. You may additional R chunk as needed if desired or simply use the one space below to do all four steps of the map-making recipe. Reminder: When you load data via read_csv(), always check whether or not the join key is GEOID or called something else. You only need to pipe read_csv() into rename() if and only if the desired join key is not GEOID and is called something else. Also: please use another color gradient (don’t use white and blue4). Please see page 3 of https://www.nceas.ucsb.edu/sites/default/files/2020-04/colorPaletteCheatsheet.pdf for available colors.
# Place your code below# Step 1:dc_shapefile <-tracts(year =2023, state ="DC") |>mutate(GEOID =as.numeric(GEOID))glimpse(dc_shapefile)# Step 2:income_data <-read_csv("data/dc_household_income_2023.csv")head(income_data)# Step 3dc_shapefile_with_data <-left_join(dc_shapefile, income_data)glimpse(dc_shapefile_with_data)# Step 4ggplot(dc_shapefile_with_data, aes(fill = median_income)) +geom_sf() +theme_void() +scale_fill_gradient(low ="coral3", high ="slategray1") +labs(fill ="Household Median\nIncome For DC 2023")# \n tells R to bring the rest of the legend to the next line
Rows: 201 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): variable
dbl (3): GEOID, median_income, moe
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
You need to follow the same recipe but this time, we use counties(year = [year], state = "[state abbreviation]") for the shape file.
Our goal for our first map is to create a county-level Virginia (VA) map for the county-level median household income using data from va_household_income_2020.csv
This time, we need to plot estimate, which represents the county-level data median income for VA.
Trying new (optional) customization this time: Let us add these new customization (a) add labels = scales::comma inside scale_fill_gradient() and (b) + theme(legend.position = c(0.2, 0.8)) at the bottom of the ggplot code then see what they do! Note: theme(legend.position = c(0.2, 0.8)) shifts the legend from the lower left corner of the image by 0.20 ‘units’ to the right (think of x-axis in Math) and 0.8 ‘units’ higher Again, these are optional.
# Step 1: Load shapefile # Note: We can just pipe any corrections into mutate() directlyva_shapefile <-counties(year =2020, state ="VA") |>mutate(GEOID =as.numeric(GEOID))# Step 2: Load VA county-level household income for 2020. va_income_data2020 <-read_csv("data/va_household_income_2020.csv")head(va_income_data2020)# Step 3: Perform the joinva_shapefile_with_data <-left_join(va_shapefile, va_income_data2020)# Step 4: Create the mapggplot(va_shapefile_with_data, aes(fill = estimate)) +geom_sf() +theme_void() +scale_fill_gradient(low="white", high="blue4", labels = scales::comma) +labs(fill='VA Median Income') +theme(legend.position =c(0.2, 0.8))
Rows: 133 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): variable
dbl (3): GEOID, estimate, moe
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Warning: A numeric `legend.position` argument in `theme()` was deprecated in ggplot2
3.5.0.
ℹ Please use the `legend.position.inside` argument of `theme()` instead.
Pre-practice
Last week, we did a demo of the Small Area Health Insurance Estimates (SAHIE) dashboard where I quickly did a demo of downloading data from:
https://www.census.gov/data-tools/demo/sahie/#/
Let us quickly visit the website and download county-level data for Maryland (MD) for 2022.
Practice 5 (10 mins)
Data pre-processing before R: After downloading the file into your computer, open the csv file in your computer and delete the first few rows until the row of variable names is the first row. Note: The uninsured rate is the column called % .
R hates symbols for column names (makes it difficult to work with in R), so before saving the file, rename % into something meaning (e.g., uninsured_rate). R also hates white spaces in column or variable names (again, makes it difficult to work with in R), so delete white spaces from column names. For Mac users, saving your file via File > Save in Numbers (the default program for csv files in Mac) does not make a csv and actually creates what is called a Numbers spreadsheet, so you have to save the new csv file under File > Export To > CSV
# Place your code belowmd_shp <-counties(year =2022, state ="MD") |>mutate(GEOID =as.numeric(GEOID))md_uninsured_data <-read_csv("data/SAHIE_02-18-2025.csv") |>rename(GEOID = ID)md_shp_with_data <-left_join(md_shp, md_uninsured_data)head(md_shp_with_data )ggplot(md_shp_with_data, aes(fill = uninsured_rate)) +geom_sf() # add customizations yourself later
New names:
Rows: 25 Columns: 9
── Column specification
──────────────────────────────────────────────────────── Delimiter: "," chr
(1): Name dbl (6): Year, ID, Margin of Error...5, uninsured_rate, % Margin of
Error, M... num (2): Number...4, Number...8
ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(GEOID)`
• `Number` -> `Number...4`
• `Margin of Error` -> `Margin of Error...5`
• `Number` -> `Number...8`
• `Margin of Error` -> `Margin of Error...9`
Rows: 52 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): fips_code, variable
dbl (2): estimate, moe
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(GEOID)`
When you are mapping the entire country, we typically scale then place Alaska, Hawaii, and Puerto Rico under the contiguous United States. We can do this in R by piping state() into shift_geometry().shift_geometry() only works for those three states/territories, but not for other territories such as Guam. Therefore, we need to remove them using filter()
Note: I am introducing the ! notation inside filter() so it is filtering for the opposite thing.
Rows: 52 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): fips_code, variable
dbl (2): estimate, moe
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining with `by = join_by(GEOID)`
Practice 6 (10 mins)
Plot data from breast_cancer_state_mortality_2023.csv to show the mortality rate for states across the US in 2023. The variable you is deaths_per_100000.
## WILL BE ASSIGNED FOR HOMEWORK
Plotting categorical data instead of numerical data
Suppose we want to map to indicate where states have expanded Medicaid access for their citizens as of December 2023. Let us view the data below before plotting:
nationwide2023 <-states(year =2023) |>mutate(GEOID =as.numeric(GEOID)) |>filter(!NAME %in%c("American Samoa", "Commonwealth of the Northern Mariana Islands", "Commonwealth of the Northern Mariana Islands", "Guam", "United States Virgin Islands")) |>shift_geometry()medicaid_expansion <-read_csv("data/medicaid_expansion.csv")head(medicaid_expansion)nationwide2023_with_data <-left_join(nationwide2023, medicaid_expansion)head(nationwide2023_with_data)
Rows: 51 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): StateName, medicaid_expansion
dbl (1): GEOID
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Simple feature collection with 6 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -91843.51 ymin: -1354601 xmax: 2042471 ymax: 1323908
Projected CRS: USA_Contiguous_Albers_Equal_Area_Conic
REGION DIVISION STATEFP STATENS GEOID GEOIDFQ STUSPS NAME LSAD
1 3 5 54 01779805 54 0400000US54 WV West Virginia 00
2 3 5 12 00294478 12 0400000US12 FL Florida 00
3 2 3 17 01779784 17 0400000US17 IL Illinois 00
4 2 4 27 00662849 27 0400000US27 MN Minnesota 00
5 3 5 24 01714934 24 0400000US24 MD Maryland 00
6 1 1 44 01219835 44 0400000US44 RI Rhode Island 00
MTFCC FUNCSTAT ALAND AWATER INTPTLAT INTPTLON
1 G4000 A 62266499712 489003081 +38.6472854 -080.6183274
2 G4000 A 138963763779 45970528648 +28.3989775 -082.5143005
3 G4000 A 143778366814 6216688589 +40.1028754 -089.1526108
4 G4000 A 206244555303 18937471947 +46.3159573 -094.1996043
5 G4000 A 25151736098 6979330958 +38.9466584 -076.6744939
6 G4000 A 2677763372 1323686976 +41.5964850 -071.5264901
StateName medicaid_expansion geometry
1 West Virginia Adopted MULTIPOLYGON (((1548777 354...
2 Florida Not Adopted MULTIPOLYGON (((1318717 -13...
3 Illinois Adopted MULTIPOLYGON (((701650.5 11...
4 Minnesota Adopted MULTIPOLYGON (((50386.4 128...
5 Maryland Adopted MULTIPOLYGON (((1718035 379...
6 Rhode Island Adopted MULTIPOLYGON (((2002871 667...
Important note for plotting categorical data: Do not use scale_fill_gradient() as we have been doing since that is for plotting numerical values. Let us plot our image before customizing our map.
How do you customize color? First determine how many unique values you have (e.g., using distinct()) and use scale_fill_manual(values = c([your colors here)) In the case of medicaid expansion, there are only two options. In this example, let us use scale_fill_manual(values = c("cornflowerblue", "darkslategrey")) where we specify two colors:
ggplot(nationwide2023_with_data,aes(fill = medicaid_expansion)) +geom_sf() +theme_void() +scale_fill_manual(values =c("cornflowerblue", "darkslategrey")) +labs(fill='Medicaid Expansion\nAs of Dec 2023')
# ONE BRIEF NOTE: within labs(), I used \n.# \n tells R you want the rest on the next line
Note: if the lack of data for Puerto Rico is bothersome to you, feel free to add “Puerto Rico” inside filter(!NAME %in% c([list here])). You might have to do this often because often, national data sets exclude Puerto Rico.
Practice 7 (10 mins)
Plot data from presidential_election1964.csv to plot which states won by Barry Goldwater vs Lyndon B. Johnson in the 1964 election. Victor is the variable you’d want to plot here. Note: the map of states hasn’t really changed since 1959 when Hawaii because the 50th state. Also, states(year = 1964) will not work since it does not exist in the API where tigris pulls the data from, please use any year for the state map in this practice (e.g., states(year = 2020))
## WILL BE ASSIGNED FOR HOMEWORK
Package for ‘Dark Mode’ images: ggdark
You may use the ggdark package as demonstrated below. This works for all ggplot (not just maps).
Instead of using theme_minimal() or theme_void(), you need to use dark_theme_minimal() or dark_theme_void().
# run install.packages("ggdark") in console if not installedlibrary(ggdark)ggplot(nationwide2023_with_data, aes(fill = medicaid_expansion)) +geom_sf() +dark_theme_minimal() +labs(fill='Medicaid Expansion\nAs of Dec 2023')
Inverted geom defaults of fill and color/colour.
To change them back, use invert_geom_defaults().